Use RFC3986 instead of manual string parsing #434
Closed
This PR moves from using `urllib2`/`urlparse` to `rfc3986` to perform URI handling. Parsed URI objects are now handled internally (and exposed through the public `RefResolver` attributes), but strings may still be passed to the public `RefResolver` API as before. Some tests needed to be changed to reflect the stricter handling of URIs, such as requiring the base URI to be absolute, or handling a URI reference with an empty fragment. Furthermore, an empty-string URI is not valid under the above rule, so a null URN is used instead. On reflection, this should probably be a locatable URI (e.g. a `file` URI) rather than a URN.

Originally, it was discussed (#346) that `hyperlink` would be the candidate library for better URI handling. However, after implementing support for `hyperlink`, the test suite ran approximately 8-9x slower (from 1-2 s to 17-18 s). I therefore chose the `rfc3986` library, and also implemented a separate branch that replaces the `rfc3986` API with the largely similar one of `hyperlink`, so that the test suite can actually be run with both `hyperlink` and `rfc3986`. It may well be that I've missed something related to caching, which would explain why the `hyperlink` implementation is so slow.

Besides the different APIs, the minor differences between the `rfc3986_patch` and `rfc_to_hyper` branches are partly that `hyperlink` doesn't support resolving against rootless URIs, so a rooted default URI has to be used there. Also, the `rfc3986` parsed URI object defines a method for string comparison, whereas the `hyperlink` one does not.

Of the libraries investigated:
- `hyperlink`: slow, immutable
- `furl`: slow, mutable
- `rfc3987`: immutable, not tested (doesn't implement normalisation yet)
- `rfc3986`: fastest, immutable objects
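To make the stricter behaviour concrete, here is a minimal, stdlib-only sketch of the reference-resolution algorithm from RFC 3986 section 5.2, which is the operation the `rfc3986` library exposes as `URIReference.resolve_with`. This is not the code from this PR or from any of the libraries above: the helper names (`split`, `resolve`, `split_fragment`, etc.) are hypothetical, and `urn:example:schema` is a placeholder base, not the null URN used in the PR. It shows why the base URI must be absolute (resolution needs a scheme to inherit, so an empty string is rejected) and that a fragment-only reference still resolves against a rootless base such as a URN.

```python
import re

# Hypothetical, stdlib-only sketch of RFC 3986 reference resolution;
# not the PR's code and not the rfc3986 library's implementation.

# RFC 3986 Appendix B: splits any string into scheme, authority, path, query, fragment.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$")


def split(uri):
    """Return (scheme, authority, path, query, fragment); undefined parts are None."""
    m = _URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def recompose(scheme, authority, path, query, fragment):
    """RFC 3986 section 5.3: join components back into a URI string."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: interpret '.' and '..' segments."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1 if path.startswith("/") else 0)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3: merge a relative-path reference with the base path."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """RFC 3986 section 5.2.2 (strict): resolve `ref` against an absolute `base`."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    if b_scheme is None:
        # An empty or relative base has no scheme to inherit, so it is rejected.
        raise ValueError(f"base URI must be absolute: {base!r}")
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        return recompose(r_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_auth is not None:
        return recompose(b_scheme, r_auth, remove_dot_segments(r_path), r_query, r_frag)
    if r_path == "":
        # Same-document reference: keep the base path (and query, unless overridden).
        query = r_query if r_query is not None else b_query
        return recompose(b_scheme, b_auth, b_path, query, r_frag)
    if r_path.startswith("/"):
        path = remove_dot_segments(r_path)
    else:
        path = remove_dot_segments(merge(b_auth, b_path, r_path))
    return recompose(b_scheme, b_auth, path, r_query, r_frag)


def split_fragment(uri):
    """Split off the fragment, e.g. to look up a JSON pointer in a resolved document."""
    scheme, auth, path, query, fragment = split(uri)
    return recompose(scheme, auth, path, query, None), fragment or ""


if __name__ == "__main__":
    print(resolve("http://a/b/c/d;p?q", "../g"))  # http://a/b/g
    print(resolve("urn:example:schema", "#/definitions/foo"))  # urn:example:schema#/definitions/foo
    print(split_fragment("http://example.com/root.json#/definitions/a"))
    # ('http://example.com/root.json', '/definitions/a')
```

Keeping undefined components as `None` rather than `""` is what the RFC's algorithm relies on: a reference of `?y` replaces the base query, while `#s` (no query at all) inherits it.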